CUDA Programming Guide: Fundamentals of CUDA Kernel Development

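CUDA kernel development begins with defining a kernel: a special C++ function designed to execute in parallel across the large number of cores on an NVIDIA GPU. Kernels are the basic unit of work in the CUDA programming model, and they act as the bridge where serial host logic hands off to massively parallel device execution.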

1. The __global__ specifier

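The __global__ declaration specifier is required on every kernel. It tells the compiler to generate code for the GPU while keeping the function's entry point visible to the CPU. A function that executes on the GPU and can be called from the host is called a kernel.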
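As a minimal sketch of the specifier in use, the example below declares one kernel and launches it from host code. The names `hello` and `launch_hello` are hypothetical and chosen only for illustration, and the launch configuration of one block with four threads is an arbitrary assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks hello as a kernel: compiled for the GPU, launched from the host.
__global__ void hello()
{
    // Every thread in the launch executes this body once.
    printf("Hello from thread %d\n", threadIdx.x);
}

// Hypothetical host helper: launches the kernel and waits for it to finish.
cudaError_t launch_hello()
{
    hello<<<1, 4>>>();                   // 1 block of 4 threads (arbitrary choice)
    cudaError_t err = cudaGetLastError(); // reports an invalid launch, if any
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();      // blocks the host until the kernel is done
}

int main()
{
    cudaError_t err = launch_hello();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The `<<<blocks, threads>>>` syntax is what distinguishes a kernel launch from an ordinary function call. Without the `cudaDeviceSynchronize()` call, the program could exit before the device output is flushed.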

2. Execution environment

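Kernels are distributed to and executed on streaming multiprocessors (SMs). The SM is the main compute engine inside an NVIDIA GPU, and each one manages hundreds of parallel threads (up to a couple of thousand resident threads, depending on the architecture). Each SM is assigned thread blocks and schedules their threads onto its processing cores.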
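To see these numbers on a particular machine, the CUDA runtime can report how many SMs a device has and how many threads each SM can keep resident. The sketch below assumes device index 0 is the GPU of interest; `sm_count` is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of SMs on the given device, or -1 on error.
int sm_count(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}

int main()
{
    const int device = 0;  // assumption: the first visible GPU
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device:                 %s\n", prop.name);
    printf("Streaming multiprocs:   %d\n", prop.multiProcessorCount);
    printf("Max threads per SM:     %d\n", prop.maxThreadsPerMultiProcessor);
    printf("Max threads per block:  %d\n", prop.maxThreadsPerBlock);
    return 0;
}
```

The reported values vary by GPU model, so no expected output is shown here.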

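Syntax rule: a kernel must return void. Because a kernel launch is asynchronous with respect to the host, a kernel cannot return a value directly to the CPU. It must instead write its results to allocated device memory, from which the host can copy them back.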
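The rule above can be sketched as follows: the kernel returns void and stores each result through a pointer into device memory, and the host copies the results back afterwards. The names `square` and `square_on_gpu` are hypothetical, the block size of 256 is an arbitrary choice, and the helper assumes `n > 0`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The kernel returns void; each thread writes its result through the out pointer.
__global__ void square(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];  // guard threads past the end of the array
}

// Hypothetical host helper: allocate, copy in, launch, copy out. Assumes n > 0.
cudaError_t square_on_gpu(const float *h_in, float *h_out, int n)
{
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);

    cudaError_t err = cudaMalloc(&d_in, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_out, bytes);
    if (err != cudaSuccess) { cudaFree(d_in); return err; }

    err = cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;                         // arbitrary block size
        int blocks = (n + threads - 1) / threads;  // round up to cover all n elements
        square<<<blocks, threads>>>(d_in, d_out, n);  // control returns to the host at once
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // This copy runs after the kernel in the default stream, so the results are complete.
        err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return err;
}

int main()
{
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    cudaError_t err = square_on_gpu(in, out, 4);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < 4; ++i) printf("%.1f ", out[i]);
    printf("\n");
    return 0;
}
```

The output array plays the role that a return value would play in ordinary C++: the host reads it only after a copy that is ordered behind the kernel.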

QUESTION 1

What is the primary function of the __global__ specifier?

It defines a function that runs on the CPU but is callable from the GPU.

It defines a kernel that runs on the GPU and is callable from the CPU.

It allocates memory on the GPU's SM cache.

It synchronizes all threads in a block.

QUESTION 2

Why must CUDA kernels return void?

Because they execute asynchronously and have no direct path to return values to the Host thread.

To save registers on the SM.

Because GPU memory is read-only.

The NVCC compiler does not support float returns.

QUESTION 3

Which hardware component is responsible for managing and executing threads in a CUDA kernel?

The PCIe Controller.

The Streaming Multiprocessor (SM).

The Host RAM controller.

The BIOS.

QUESTION 4

What happens when a Host calls a kernel function?

The CPU halts until the GPU finishes processing.

The GPU creates a clone of the function for every available SM.

The kernel is enqueued for execution on the GPU, and the CPU continues to the next instruction.

The CPU performs a context switch to the GPU.

QUESTION 5

Which of the following is the correct definition of a CUDA kernel?

A function that executes on the GPU and is invoked from the Host.

A C++ library for file I/O.

A hardware driver for NVIDIA GPUs.

A standard CPU function with the __gpu__ prefix.
